NSF PAR Search | NSF Public Access Repository

Anyone can be the best: Impact of diverse methodologies on the evaluation of structural variant callers

https://doi.org/10.1101/2025.08.28.672546

Denti, Luca; Krannich, Thomas; Vinar, Tomas; Chikhi, Rayan; Bonizzoni, Paola; Brejova, Brona; Hormozdiari, Fereydoun (September 2025, bioRxiv)

Abstract Structural variants (SVs) are medium and large-scale genomic alterations that shape phenotypic diversity and disease risk. Numerous methods have been proposed for discovering SVs, however their benchmarking has been inconsistent across studies, often resulting in contradictory findings. One of the main sources of conflicting evaluation results is the lack of consistency in the SV callsets used as ground truth, ranging from curated callsets released by consortia to more recent approaches that construct callsets from high-quality telomere-to-telomere de novo haplotype assemblies. The discrepancies between benchmarks are further compounded by the choice of the reference genome (GRCh37, GRCh38, and T2T-CHM13), where using T2T-CHM13 reveals a different deletion/insertion profile, indicating reduced reference bias. We evaluated the performance of several state-of-the-art SV discovery methods from long-read whole-genome sequencing data and observed substantial variation in their performance and rankings, depending on the choice of ground truth, reference genome, and genomic regions used for evaluation. Counter-intuitively, the more complete reference genome T2T-CHM13 does not inherently solve the problem of SV benchmarking; instead it reveals the limitations of each detection method in complex genomic regions. The substantial variation in detection accuracy across different genomic regions calls for additional caution in downstream analyses and in drawing conclusions based on predicted SVs. These findings underscore the complexity of evaluating SV detection methods and highlight the need for careful consideration and, ideally, field-standard best practices when reporting performance metrics.
Full Text Available
Efficient Analysis of Annotation Colocalization Accounting for Genomic Contexts

Gafurov, Askar; Vinar, Tomas; Medvedev, Paul; Brejova, Brona (May 2024, Springer)

Full Text Available
Pangenome graph augmentation from unassembled long reads

https://doi.org/10.1101/2025.02.07.637057

Denti, Luca; Bonizzoni, Paola; Brejova, Brona; Chikhi, Rayan; Krannich, Thomas; Vinar, Tomas; Hormozdiari, Fereydoun (February 2025, bioRxiv)

Abstract Pangenomes are becoming increasingly popular data structures for genomics analyses due to their ability to compactly represent the genetic diversity within populations. Constructing a pangenome graph, however, is still a time-consuming and expensive process. A promising approach for pangenome construction consists of progressively augmenting a pangenome graph with additional high-quality assemblies. Currently, there is no method for augmenting a pangenome graph with unassembled reads from newly sequenced samples without first aligning the reads to a reference genome and performing variant calling and genotyping on the new individuals. In this work, we present the first assembly-free and mapping-free approach for augmenting an existing pangenome graph using unassembled long reads from an individual not already present in the pangenome. Our approach consists of finding sample-specific sequences in reads using efficient indexes, clustering reads corresponding to the same novel variant(s), and then building a consensus sequence to be added to the pangenome graph for each variant separately. Using simulated reads based on Human Pangenome Reference Consortium (HPRC) assemblies, we demonstrate the effectiveness of the proposed approach for progressively augmenting the pangenome with long reads, without the need for de novo assembly or predicting genetic variants of the new sample. The software is freely available at https://github.com/ldenti/palss.
Full Text Available
